markov decision process and informativeness
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
This paper introduces a new complexity measure for MDPs called maximum expected hitting cost. Unlike the diameter measure is only a function of the transition dynamics, this new measure takes into account the reward dynamics as well. The authors show theoretically that under the same assumptions as previous authors who introduced diameter, this new measure is a tighter upper bound. Furthermore, they show the usefulness of this measure by showing that it can be used to better understand the informativeness of rewards when using potential based reward shaping and they prove theoretically that in a large class of MDPs potential based reward shaping is bounded by a multiplicative factor of 2 on their maximum expected hitting costs. I enjoyed reading this paper. I appreciated the structure that the authors used in this paper which first introduced all the necessary prior work (related to diameter) cosily but thoroughly enough before introducing their contributions.
Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
The paper introduces a new complexity measure for MDPs, the expected hitting costs. In contrast to former complexity measures, the hitting costs also depend on the reward of the MDP and can provide a tighter bound for UCRL2. The theory also provides an intersting connection between reward shapeing and the complexity of a MDP. All reviewers appreciated the strong theoretical contribution of the paper which improves our theoretical understanding of the complexity of MDPs. The reviewers also liked that the paper is well written and establishes connections to reward shaping, a method that has also a highly practical value. All reviewers recommend acceptance and I agree with their assessment.
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting.
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting.
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
Dai, Falcon Z., Walter, Matthew R.
We propose a new complexity measure for Markov decision processes (MDP), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)